Big linked cancer data: Integrating linked TCGA and PubMed

Authors

  • Muhammad Saleem
  • Maulik R. Kamdar
  • Aftab Iqbal
  • Shanmukha Sampath
  • Helena F. Deus
  • Axel-Cyrille Ngonga Ngomo
Abstract

The amount of bio-medical data available on the Web grows exponentially with time. The resulting large volume of data makes manual exploration very tedious. Moreover, the velocity at which this data changes and the variety of formats in which bio-medical data is published make it difficult to access the data in an integrated form. Finally, the lack of an integrated vocabulary makes querying this data more difficult. In this paper, we advocate the use of Linked Data to integrate, query and visualize bio-medical data. The resulting Big Linked Data allows discovering knowledge distributed across manifold sources, making it viable for the serendipitous discovery of novel knowledge. We present the concept of Big Linked Data by showing how the constant stream of new bio-medical publications can be integrated with the Linked Cancer Genome Atlas dataset (TCGA) within a virtual integration scenario. We ensure the scalability of our approach through the novel TopFed federated query engine, which we evaluate by comparing the query execution time of our system with that of FedX on Linked TCGA. Then, we show how we can harness the value hidden in the underlying integrated data by making it easier to explore through a user-friendly interface. We evaluate the usability of the interface by using the standard system usability questionnaire as well as a customized questionnaire designed for the users of our system. Our overall result of 77 suggests that our interface is easy to use and can thus lead to novel insights.

Similar resources

Fostering Serendipity through Big Linked Data

The amount of bio-medical data available over the Web grows exponentially with time. The large volume of the currently available data makes it difficult to explore, while the velocity at which this data changes and the variety of formats in which bio-medical data is published make it difficult to access the data in an integrated form. Moreover, the lack of an integrated vocabulary makes querying this d...

Zodiac: A Comprehensive Depiction of Genetic Interactions in Cancer by Integrating TCGA Data.

BACKGROUND: Genetic interactions play a critical role in cancer development. Existing knowledge about cancer genetic interactions is incomplete and, in particular, lacks evidence derived from large-scale cancer genomics data. The Cancer Genome Atlas (TCGA) produces multimodal measurements across genomics and features of thousands of tumors, which provide an unprecedented opportunity to investigate th...

Linked Functional Annotation For Differentially Expressed Gene (DEG) Demonstrated using Illumina Body Map 2.0

Semantic Web technologies are central to the integration of disparate data resources. They can be used to exploit data from next generation sequencing (NGS) for therapeutic decisions regarding cancer. In this manuscript, we describe how different data resources, which inform on the expression of specific genes in a tissue and its variants, can be brought together to indicate a risk for tissue-speci...

TopFed: TCGA tailored federated query processing and linking to LOD

BACKGROUND: The Cancer Genome Atlas (TCGA) is a multidisciplinary, multi-institutional effort to catalogue genetic mutations responsible for cancer using genome analysis techniques. One of the aims of this project is to create a comprehensive and open repository of cancer-related molecular analysis, to be exploited by bioinformaticians towards advancing cancer knowledge. However, devising bioinfo...

Gene montage, a new method for identifying genetic mutations: an essential application for the molecular analysis of complex genes associated with hereditary breast cancer

Background: Most disease-causing genes are large and complex, with a variety of exons. Gene montage is a new technique for forming a large linked DNA segment that can be easily detected by DNA sequencing or Denaturing High Performance Liquid Chromatography (DHPLC). Methods: Exons 2, 20, 23 and 24 of the BRCA1 gene were linked and analyzed by DNA sequencing. Exons 2 and 20 are i...

Journal title:
  • J. Web Sem.

Volume 27

Publication date 2014